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Abstract 


The standards for Internationalized Domain Names in Applications (IDNA) require a review of 
each new version of Unicode to determine whether incompatibilities with prior versions or other 
issues exist and, where appropriate, to allow the IETF to decide on the trade-offs between 
compatibility with prior IDNA versions and compatibility with Unicode going forward. That 
requirement, and its relationship to tables maintained by IANA, has caused significant confusion 
in the past. This document makes adjustments to the review procedure based on experience and 
updates IDNA, specifically RFC 5892, to reflect those changes and to clarify the various 
relationships involved. It also makes other minor adjustments to align that document with 
experience. 
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1. Introduction 


The standards for Internationalized Domain Names in Applications (IDNA) require a review of 
each new version of Unicode to determine whether incompatibilities with prior versions or other 
issues exist and, where appropriate, to allow the IETF to decide on the trade-offs between 
compatibility with prior IDNA versions and compatibility with Unicode [Unicode] going forward. 
That requirement, and its relationship to tables maintained by IANA, has caused significant 
confusion in the past (see Section 3 and Section 4 for additional discussion of the question of 
appropriate decisions and the history of these reviews). This document makes adjustments to the 
review procedure based on nearly a decade of experience and updates IDNA, specifically the 
document that specifies the relationship between Unicode code points and IDNA derived 
properties [RFC5892], to reflect those changes and to clarify the various relationships involved. 


This specification does not change the requirement that registries at all levels of the DNS tree 
take responsibility for the labels they insert in the DNS, a level of responsibility that requires 

allowing only a subset of the code points and strings allowed by the IDNA protocol itself. That 
requirement is discussed in more detail in a companion document [RegRestr]. 


Terminology note: In this document, "IDNA" refers to the current version as described in RFC 
5890 [RFC5890] and subsequent documents and sometimes known as "IDNA2008". Distinctions 
between it and the earlier version are explicit only where they are necessary for understanding 
the relationships involved, e.g., in Section 2. 


2. Brief History of IDNA Versions, the Review Requirement, 
and RFC 5982 


The original, now-obsolete, version of IDNA, commonly known as "IDNA2003" [RFC3490] 
[RFC3491], was defined in terms of a profile of a collection of IETF-specific tables [RFC3454] that 
specified the usability of each Unicode code point with IDNA. Because the tables themselves were 
normative, they were intrinsically tied to a particular version of Unicode. As Unicode evolved, 
the IDNA2003 standard would have required the creation of a new profile for each new version 
of Unicode, or the tables would have fallen further and further behind. 


When IDNA2003 was superseded by the current version, known as IDNA2008 [RFC5890], a 
different strategy, one that was property-based rather than table-based, was adopted for a 
number of reasons, of which the reliance on normative tables was not dominant [RFC4690]. In 
the IDNA2008 model, the use of normative tables was replaced by a set of procedures and rules 
that operated on Unicode properties [Unicode-properties] and a few internal definitions to 
determine the category and status, and hence an IDNA-specific "derived property", for any given 
code point. Those rules are, in principle, independent of Unicode versions. They can be applied to 
any version of Unicode, at least from approximately version 5.0 forward, to yield an appropriate 
set of derived properties. However, the working group that defined IDNA2008 recognized that 
not all of the Unicode properties were completely stable and that, because the criteria for new 
code points and property assignment used by the Unicode Consortium might not precisely align 
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with the needs of IDNA, there were possibilities of incompatible changes to the derived property 
values. More specifically, there could be changes that would make previously disallowed labels 
valid, previously valid labels disallowed, or that would be disruptive to IDNA's defining rule 
structure. Consequently, IDNA2008 provided for an expert review of each new version of 
Unicode with the possibility of providing exceptions to the rules for particular new code points, 
code points whose properties had changed, and newly discovered issues with the IDNA2008 
collection of rules. When problems were identified, the reviewer was expected to notify the IESG. 
The assumption was that the IETF would review the situation and modify IDNA2008 as needed, 
most likely by adding exceptions to preserve backward compatibility (see Section 3.1). 


For the convenience of the community, IDNA2008 also provided that IANA would maintain 
copies of calculated tables resulting from each review, showing the derived properties for each 
code point. Those tables were expected to be helpful, especially to those without the facilities to 
easily compute derived properties themselves. Experience with the community and those tables 
has shown that they have been confused with the normative tables of IDNA2003: the IDNA2008 
tables published by IANA have never been normative, and statements about IDNA2008 being out 
of date with regard to some Unicode version because the IANA tables have not been updated are 
incorrect or meaningless. 


3. The Review Model 


While the text has sometimes been interpreted differently, IDNA2008 actually calls for two types 
of review when a new Unicode version is introduced. One is an algorithmic comparison of the set 
of derived properties calculated from the new version of Unicode to the derived properties 
calculated from the previous one to determine whether incompatible changes have occurred. 
The other is a review of newly assigned code points to determine whether any of them require 
special treatment (e.g., assignment of what IDNA2008 calls contextual rules) and whether any of 
them violate any of the assumptions underlying the IDNA2008 derived property calculations. 
Any of the cases of either review might require either per-code point exceptions or other 
adjustments to the rules for deriving properties that are part of RFC 5892. The subsections below 
provide a revised specification for the review procedure. 


Unless the IESG or the designated expert team concludes that there are special problems or 
unusual circumstances, these reviews will be performed only for major Unicode versions (those 
numbered NN.O, e.g., 12.0) and not for minor updates (e.g., 12.1). 


As can be seen in the detailed descriptions in the following subsections, proper review will 
require a team of experts that has both broad and specific skills in reviewing Unicode characters 
and their properties in relation to both the written standards and operational needs. The IESG 
will need to appoint experts who can draw on the broader community to obtain the necessary 
skills for particular situations. See the IANA Considerations (Section 7) for details. 
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3.1. Review Model Part I: Algorithmic Comparison 


Section 5.1 of RFC 5892 is the description of the process for creating the initial IANA tables. It is 
noteworthy that, while it can be read as strongly implying new reviews and new tables for 
versions of Unicode after 5.2, it does not explicitly specify those reviews or, e.g., the timetable for 
completing them. It also indicates that incompatibilities are to be "flagged for the IESG" but does 
not specify exactly what the IESG is to do about them and when. For reasons related to the other 
type of review and discussed below, only one review was completed, documented [RFC6452], and 
a set of corresponding new tables installed. That review, which was for Unicode 6.0, found only 
three incompatibilities; the consensus was to ignore them (not create exceptions in IDNA2008) 
and to remain consistent with computations based on current (Unicode 6.0) properties rather 
than preserving backward compatibility within IDNA. The 2018 review (for Unicode 11.0 and 
versions in between it and 6.0) [IDNA-Unicode12] also concluded that Unicode compatibility, 
rather than IDNA backward compatibility, should be maintained. That decision was partially 
driven by the long period between reviews and the concern that table calculations by others in 
the interim could result in unexpected incompatibilities if derived property definitions were 
then changed. See Section 4 for further discussion of these preferences. 


3.2. Review Model Part II: New Code Point Analysis 


The second type of review, which is not clearly explained in RFC 5892, is intended to identify 
cases in which newly added or recently discovered problematic code points violate the design 
assumptions of IDNA, to identify defects in those assumptions, or to identify inconsistencies 
(from an IDNA perspective) with Unicode commitments about assignment, properties, and 
stability of newly added code points. One example of this type of review was the discovery of 
new code points after Unicode 7.0 that were potentially visually equivalent, in the same script, to 
previously available code point sequences [IAB-Unicode7-2015] [IDNA-Unicode’]. 


Because multiple perspectives on Unicode and writing systems are required, this review will not 
be successful unless it is done by a team. Finding one all-knowing expert is improbable, and a 
single expert is unlikely to produce an adequate analysis. Rather than any single expert being the 
sole source of analysis, the designated expert (DE) team needs to understand that there will 
always be gaps in their knowledge, to know what they don't know, and to work to find the 
expertise that each review requires. It is also important that the DE team maintains close contact 
with the Area Directors (ADs) and that the ADs remain aware of the team's changing needs, 
examining and adjusting the team's membership over time, with periodic reexamination at least 
annually. It should also be recognized that, if this review identifies a problem, that problem is 
likely to be complex and/or involve multiple trade-offs. Actions to deal with it are likely to be 
disruptive (although perhaps not to large communities of users), or to leave security risks 
(opportunities for attacks and inadvertent confusion as expected matches do not occur), or to 
cause excessive reliance on registries understanding and taking responsibility for what they are 
registering [RFC5894] [RegRestr]. The latter, while a requirement of IDNA, has often not worked 
out well in the past. 
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Because resolution of problems identified by this part of the review may take some time even if 
that resolution is to add additional contextual rules or to disallow one or more code points, there 
will be cases in which it will be appropriate to publish the results of the algorithmic review and 
to provide IANA with corresponding tables, with warnings about code points whose status is 
uncertain until there is IETF consensus about how to proceed. The affected code points should be 
considered unsafe and identified as "under review" in the IANA tables until final derived 
properties are assigned. 


4. IDNA Assumptions and Current Practice 


At the time the IDNA2008 documents were written, the assumption was that, if new versions of 
Unicode introduced incompatible changes, the Standard would be updated to preserve backward 
compatibility for users of IDNA. For most purposes, this would be done by adding to the table of 
exceptions associated with Rule G [RFC5892a]. 


This has not been the practice in the reviews completed subsequent to Unicode 5.2, as discussed 
in Section 3. Incompatibilities were identified in Unicode 6.0 [RFC6452] and in the cumulative 
review leading to tables for Unicode 11.0 [IDNA-Unicode11]. In all of those cases, the decision 
was made to maintain compatibility with Unicode properties rather than with prior versions of 
IDNA. 


If an algorithmic review detects changes in Unicode after version 12.0 that would break 
compatibility with derived properties associated with prior versions of Unicode or changes that 
would preserve compatibility within IDNA at the cost of departing from current Unicode 
specifications, those changes must be captured in documents expected to be published as 
Standards Track RFCs so that the IETF can review those changes and maintain a historical 
record. 


The community has now made decisions and updated tables for Unicode 6.0 [RFC6452], done 
catch-up work between it and Unicode 11.0 [IDNA-Unicode11], and completed the review and 
tables for Unicode 12.0 [IDNA-Unicode12]. The decisions made in those cases were driven by 
preserving consistency with Unicode and Unicode property changes for reasons most clearly 
explained by the IAB [IAB-Unicode-2018]. These actions were not only at variance with the 
language in RFC 5892 but were also inconsistent with commitments to the registry and user 
communities to ensure that IDN labels that were once valid under IDNA2008 would remain 
valid, and previously invalid labels would remain invalid, except for those labels that were 
invalid because they contained unassigned code points. 


This document restores and clarifies that original language and intent: absent extremely strong 
evidence on a per-code point basis that preserving the validity status of possible existing (or 
prohibited) labels would cause significant harm, Unicode changes that would affect IDNA 
derived properties are to be reflected in IDNA exceptions that preserve the status of those labels. 
There is one partial exception to this principle. If the new code point analysis (see Section 3.2) 
concludes that some code points or collections of code points should be further analyzed, those 
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code points, and labels including them, should be considered unsafe and used only with extreme 
caution because the conclusions of the analysis may change their derived property values and 
status. 


5. Derived Tables Published by IANA 


As discussed above, RFC 5892 specified that derived property tables be provided via an IANA 
registry. Perhaps because most IANA registries are considered normative and authoritative, that 
registry has been the source of considerable confusion, including the incorrect assumption that 
the absence of published tables for versions of Unicode later than 6.0 meant that IDNA could not 
be used with later versions. That position was raised in multiple ways, not all of them consistent, 
especially in the ICANN context [ICANN-LGR-SLA]. 


If the changes specified in this document are not successful in significantly mitigating the 
confusion about the status of the tables published by IANA, serious consideration should be 
given to eliminating those tables entirely. 


6. Editorial Clarification to RFC 5892 


This section updates RFC 5892 to provide fixes for known applicable errata and omissions. In 
particular, verified RFC Editor Erratum 3312 [Err3312] provides a clarification to Appendix A and 
A.1 in RFC 5892. That clarification is incorporated below. 


1. In Appendix A, add a new paragraph after the paragraph that begins "The code point...". The 
new paragraph should read: 


For the rule to be evaluated to True for the label, it MUST be evaluated separately for 
every occurrence of the code point in the label; each of those evaluations must 
result in True. 


2. In Appendix A.1, replace the "Rule Set" by 


Rule Set: 
False; 
If Canonical_Combining_Class(Before(cp)) .eq. Virama 
Then True; 
If cp .eq. \u200C And 
RegExpMatch( (Joining _Type:{L,D}) (Joining _Type:T)*cp 
(Joining_Type:T)*(Joining_Type:{R,D})) Then True; 
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7. IANA Considerations 


For the algorithmic review described in Section 3.1, the IESG is to appoint a designated expert 
[RFC8126] with appropriate expertise to conduct the review and to supply derived property 
tables to IANA. As provided in Section 5.2 of the Guidelines for Writing IANA Considerations 
[RFC8126], the designated expert is expected to consult additional sources of expertise as needed. 
For the code point review, the expertise will be supplied by an IESG-designated expert team as 
discussed in Section 3.2 and Appendix B. In both cases, the experts should draw on the expertise 
of other members of the community as needed. In particular, and especially if there is no overlap 
of the people holding the various roles, coordination with the IAB-appointed liaison to the 
Unicode Consortium will be essential to mitigate possible errors due to confusion. 


As discussed in Section 5, IANA has modified the IDNA tables collection [[ANA-IDNA-Tables] by 
identifying them clearly as non-normative, so that a "current" or "correct" version of those tables 
is not implied, and by pointing to this document for an explanation. IANA has published tables 
supplied by the IETF for all Unicode versions through 11.0, retaining all older versions and 
making them available. Newer tables will be constructed as specified in this document and then 
made available by IANA. IANA has changed the title of that registry from "IDNA Parameters", 
which is misleading, to "IDNA Rules and Derived Property Values". 


The "Note" in that registry says: 


IDNA does not require that applications and libraries, either for registration/storage or 
lookup, support any particular version of Unicode. Instead, they are required to use 
derived property values based on calculations associated with whatever version of 
Unicode they are using elsewhere in the application or library. For the convenience of 
application and library developers and others, the IETF has supplied, and IANA 
maintains, derived property tables for several version of Unicode as listed below. It 
should be stressed that these are not normative in that, in principle, an application can 
do its own calculations and these tables can change as IETF understanding evolves. By 
contrast, the list of code points requiring contextual rules and the associated rules are 
normative and should be treated as updates to the list in RFC 5892. 


As long as the intent is preserved, the text of that note may be changed in the future at IANA's 
discretion. 


IANA's attention is called to the introduction, in Section 3.2, of a temporary "under review" 
category to the PVALID, DISALLOWED, etc., entries in the tables. 
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8. Security Considerations 


Applying the procedures described in this document and understanding of the clarifications it 
provides should reduce confusion about IDNA requirements. Because past confusion has 
provided opportunities for bad behavior, the effect of these changes should improve Internet 
security to at least some small extent. 


Because of the preference to keep the derived property value stable (as specified in RFC 5892 and 
discussed in Section 4), the algorithm used to calculate those derived properties does change as 
explained in Section 3. If these changes are not taken into account, the derived property value 
will change, and the implications might have negative consequences, in some cases with security 
implications. For example, changes in the calculated derived property value for a code point 
from either DISALLOWED to PVALID or from PVALID to DISALLOWED can cause changes in label 
interpretation that would be visible and confusing to end users and might enable attacks. 
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Appendix A. Summary of Changes to RFC 5892 


Other than the editorial correction specified in Section 6, all of the changes in this document are 
concerned with the reviews for new versions of Unicode and with the IANA Considerations in 
Section 5 of [RFC5892], particularly Section 5.1 of [RFC5892]. Whether the changes are 
substantive or merely clarifications may be somewhat in the eye of the beholder, so the list 
below should not be assumed to be comprehensive. At a very high level, this document clarifies 
that two types of review were intended and separates them for clarity. This document also 
restores the original (but so far unobserved) default for actions when code point derived 
properties change. For this reason, this document effectively replaces Section 5.1 of [RFC5892] 
and adds or changes some text so that the replacement makes better sense. 


Changes or clarifications that may be considered important include: 


e Separated the new Unicode version review into two explicit parts and provided for different 
review methods and, potentially, asynchronous outcomes. 


e Specified a DE team, not a single designated expert, for the code point review. 


e Eliminated the de facto requirement for the (formerly single) designated expert to be the 
same person as the IAB's liaison to the Unicode Consortium, but called out the importance of 
coordination. 

e Created the "Status" field in the IANA tables to inform the community about specific 
potentially problematic code points. This change creates the ability to add information about 
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such code points before IETF review is completed instead of having the review process hold 
up the use of the new Unicode version. 


e In part because Unicode is now on a regular one-year cycle rather than producing major and 
minor versions as needed, to avoid overloading the IETF's internationalization resources, 
and to avoid generating and storing IANA tables for trivial changes (e.g., the single new code 
point in Unicode 12.1), the review procedure is applied only to major versions of Unicode 
unless exceptional circumstances arise and are identified. 


Appendix B. Background and Rationale for Expert Review 
Procedure for New Code Point Analysis 


The expert review procedure for new code point analysis described in Section 3.2 is somewhat 
unusual compared to the examples presented in the Guidelines for Writing IANA Considerations 
[RFC8126]. This appendix explains that choice and provides the background for it. 


Development of specifications to support use of languages and writing systems other than 
English (and Latin script) -- so-called "internationalization" or "i18n" -- has always been 
problematic in the IETF, especially when requirements go beyond simple coding of characters 
(e.g., RFC 3629 [RFC3629]) or simple identification of languages (e.g., RFC 3282 [RFC3282] and the 
earlier RFC 1766 [RFC1766]). A good deal of specialized knowledge is required, knowledge that 
comes from multiple fields and that requires multiple perspectives. The work is not obviously 
more complex than routing, especially if one assumes that routing work requires a solid 
foundation in graph theory or network optimization, or than security and cryptography, but 
people working in those areas are drawn to the IETF and people from the fields that bear on 
internationalization typically are not. 


As a result, we have often thought we understood a problem, generated a specification or set of 
specifications, but then have been surprised by unanticipated (by the IETF) issues. We then 
needed to tune and often revise our specification. The language tag work that started with RFC 
1766 is a good example of this: broader considerations and requirements led to later work and a 
much more complex and finer-grained system [RFC5646]. 


Work on IDNs further increased the difficulties because many of the decisions that led to the 
current version of IDNA require understanding the DNS, its constraints, and, to at least some 
extent, the commercial market of domain names, including various ICANN efforts. 


The net result of these factors is that it is extremely unlikely that the IESG will ever find a 
designated expert whose knowledge and understanding will include everything that is required. 


Consequently, Section 7 and other discussions in this document specify a DE team that is 
expected to have the broad perspective, expertise, and access to information and community in 
order to review new Unicode versions and to make consensus recommendations that will serve 
the Internet well. While we anticipate that the team will have one or more leaders, the structure 
of the team differs from the suggestions given in Section 5.2 of the Guidelines for Writing IANA 
Considerations [RFC8126] since neither the team's formation nor its consultation is left to the 
discretion of the designated expert, nor is the designated expert solely accountable to the 
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community. A team that contains multiple perspectives is required, the team members are 
accountable as a group, and any nontrivial recommendations require team consensus. This also 
differs from the common practice in the IETF of "review teams" from which a single member is 
selected to perform a review: the principle for these reviews is team effort. 
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